Method of transmitting or storing digitalized, multi-channel audio signals

ABSTRACT

In order to transmit or to store digitalized, multi-channel audio signals which are digitally represented by a plurality of spectral subband signals, at an encoder, subband signals of different audio channels but having the same frequency position are combined across channels according to a dynamic control signal. This control signal is derived from several audio channels by an audio signal analysis based on a psycho-acoustic binaural model. At a decoder, the subband signals of several audio channels but having the same frequency position are dissociated across channels according to a control value derived from the dynamic control signal, transmitted or stored therewith.

BACKGROUND OF THE INVENTION

The invention relates to a method of transmitting or storing digitalized, multi-channel audio signals. More particularly, one aspect of the invention is directed to a transmitting or storing method of the type in which, on the encoder side, (a) each audio channel is represented digitally by a number of spectral subband signals, wherein for each subband signal quantized sampling values that are discrete in time are present; (b) the quantization of the sampling values is changed (coded) in the individual subbands according to the channel-specific, intra-channel perceptual threshold of human hearing in the sense of a reduction in irrelevance; (c) for each subband signal a scale factor (SCF) is determined that classes the peak value of the subband signal within a certain time interval (block) and that is used in the normalization of the subband signal prior to quantization; and (d) for the control of the quantization of the sampling values of each subband signal, a piece of bit allocation information (BAL) is obtained from an audio signal analysis and is stored or transmitted together with the determined scale factors (SCF) and the coded subband signals, and in which, on the decoder side, (e) the coded subband signals are decoded according to the bit allocation information (BAL) and the subband signal levels are denormalized according to the scale factors (SCF); and (f) the decoded and denormalized subband signals of each audio channel are combined into one broadband, digital audio signal.

For transmitting and storing digital audio signals, it is known (DE 34 40 613 and DE 36 39 753) to perform a data reduction with the aid of a coding in subbands and with the use of a psychoacoustic model. In these cases the digitalized, broadband audio signal is represented by a number of spectral subband signals, and the quantization of the sampling values is effected in the individual subbands according to the perceptual threshold of human hearing. The data reduction factor that can be achieved with this "monophonic coding" for very high-quality audio signals is approximately 6 at a 48 kHz sampling frequency and 16-bit linear quantization, which corresponds to a data rate of 2×128 kbit/s per stereophonic audio signal.

Also known are measures for increasing the data reduction factor during coding of dual-channel, stereophonic signals, which are based on suppressing redundant signal components in the left and right channel, as well as such components which are irrelevant with regard to the stereophonic effect. Two different embodiments of this "stereophonic coding" are described in Netherlands Patent Application No. 90 00 338 and in "Perceptual Transform Coding of Wideband Stereo Signals," Proceedings of the ICASSP 1990, namely

periodic transmission of subband composite signals in the upper frequency range, and block-wise reconstruction of the subband signal level in the left and right channels with the aid of scale factors which represent the maximum signal level in the subbands of the left and right audio channels, and

creation of sum (M=MONO) and differential (S=SIDES) signals from the left (L) and right (R) audio signal in accordance with the matrixing M=L+R and S=L-R, where the decoding of the sum signals and differential signals is effected according to the perceptual threshold determined for the sum signal M.
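The sum/difference matrixing named in the second embodiment can be sketched as follows. This is only an illustration of M=L+R and S=L-R and the corresponding inverse; the function names are hypothetical, and the threshold-controlled coding of M and S is not modeled.

```python
def ms_encode(left, right):
    """Form sum (M) and difference (S) samples: M = L + R, S = L - R."""
    return left + right, left - right


def ms_decode(mid, side):
    """Invert the matrixing: L = (M + S) / 2, R = (M - S) / 2."""
    return (mid + side) / 2.0, (mid - side) / 2.0


# Round trip for one pair of sample values (illustrative numbers only).
m, s = ms_encode(0.75, 0.25)
left, right = ms_decode(m, s)
```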

If the separate coding in the subbands of the left and right audio channels is supplemented by stereophonic coding in such a way that one of these methods or a combination of the two methods is employed, the data reduction factor can be increased from approximately 6 to approximately 8. If, for example, a bit rate of 256 kbit/s is required for the transmission of two independent mono signals of the same quality as a 16-bit linear quantization, only 192 kbit/s are required with stereophonic coding for the transmission of a dual-channel stereo signal of the same subjective quality.
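The bit rates quoted above follow from dividing the linear PCM rate by the data reduction factor. The short calculation below is a sketch under the assumption of a 48 kHz sampling frequency, which is the value consistent with the stated 2×128 kbit/s; the function name is hypothetical.

```python
def coded_rate_kbps(sample_rate_hz, bits_per_sample, channels, reduction_factor):
    """Bit rate in kbit/s after dividing the linear PCM rate by a reduction factor."""
    pcm_rate = sample_rate_hz * bits_per_sample * channels  # bit/s before coding
    return pcm_rate / reduction_factor / 1000.0


# Two channels, 16-bit linear quantization, assumed 48 kHz sampling frequency.
mono_pair = coded_rate_kbps(48_000, 16, 2, 6)  # separate "monophonic coding"
stereo = coded_rate_kbps(48_000, 16, 2, 8)     # with "stereophonic coding"
```

With these assumptions, mono_pair corresponds to the 256 kbit/s and stereo to the 192 kbit/s mentioned in the text.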

SUMMARY OF THE INVENTION

In contrast, the object of the invention is to achieve an even more substantial data reduction in dual-channel or multi-channel stereophonic signals.

This object is accomplished, according to one aspect of the invention, in that on the encoder side, the subband signals of different audio channels, but which have the same frequency position, are combined in an interchannel manner according to a dynamic control signal (COS) which is obtained by means of an audio signal analysis of a plurality of audio channels that is oriented toward a binaural, psychoacoustic model, and in that, on the decoder side, the subband signals of different audio channels, but the same frequency position, are decombined as a function of a control variable (COM), which is derived from the dynamic control signal (COS) and also transmitted or stored.

The invention is based on the consideration that the bit rate for a subband-coded, stereophonic signal can still be lowered considerably when diverse phenomena of spatial hearing and known models of binaural signal processing in human hearing are considered. Namely, it has been seen that a large information component of the stereophonic signal is irrelevant. In particular, human hearing can detect no spatial differences for certain spectral components and within certain time intervals, i.e. within certain time intervals a complete channel separation is not necessary for certain spectral ranges. Moreover, as a consequence of the effect of "adjacent-channel masking," the quantization or suppression of individual subband signals can be effected according to the highest adjacent-channel perceptual threshold when the associated common-channel perceptual threshold is lower. However, for the use of adjacent-channel masking, the effect of a reduced masking in spatially-separated masking ("Masking Level Difference," MLD) must be considered, and thus the playback arrangement must be defined. Such an optimization of stereophonic coding will be particularly significant in future digital audio systems having a higher number of transmission and playback channels. With the increase in audio channels, the directional stability of the audio representation in the image region between the front loudspeakers, and the possibilities of spatial audio representation, increase. For this purpose, for example, the left and right loudspeakers are supplemented by a center loudspeaker and two surround loudspeakers, so that three further tone-transmission channels are necessary. In many cases, a bit rate that increases proportionally with the number of channels places an excessive load on the transmission capacity. For example, a doubling in the bit rate would already be unacceptable for additional center channels and surround channels in future digital audio broadcasting if the number of programs would have to be halved correspondingly.

Therefore, it is particularly desired to reduce the bit rate for five-channel, stereophonic audio signals (3 front signals L (left), C (center), R (right) and 2 surround signals LS (left-surround), RS (right-surround); abbreviated "3/2 stereo signals") of 5×(192/2)=480 kbit/s. An important condition for introducing multi-channel audio transmission systems is, in many cases, compatibility with present digital, dual-channel stereo receivers. Because the transmission capacity is often too limited to transmit a complete multi-channel signal in addition to the conventional dual-channel signal according to ISO/MPEG Standard 11172-3 corresponding to the simulcast principle, a dual-channel base signal Lo, Ro, which is suited for playback with dual-channel stereo receivers, must be obtained out of the multi-channel stereo signal with the aid of compatibility matrixings prior to transmission. The following system of equations is provided as an example for a compatibility matrixing for five-channel (3/2) stereo signals:

    T1=L+xC+yLS=Lo                                             (1)

    T2=R+xC+yRS=Ro                                             (2)

    T3=xC                                                      (3)

    T4=yLS                                                     (4)

    T5=yRS                                                     (5)

where x and y are coefficients in a range of, for example, 0.5 to 0.8.

The transmission signals T1 and T2 form the dual-channel stereo base signal packet Lo/Ro, while the additional transmission signals T3, T4, T5 contain the information required for dematrixing. If the original five-channel stereo signals L, C, R, LS, RS are supposed to be completely reconstructed on the decoder side, the dematrixing rule is as follows:

    L'=T1-T3-T4                                                (6)

    R'=T2-T3-T5                                                (7)

    C'=T3/x                                                    (8)

    LS'=T4/y                                                   (9)

    RS'=T5/y                                                   (10)
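Equations (1) through (10) can be checked with a small round trip on single sample values. The sketch below is illustrative only: the function names are hypothetical, and x=y=0.7 is an assumed choice within the example range of 0.5 to 0.8 given above.

```python
def matrix_32(L, R, C, LS, RS, x=0.7, y=0.7):
    """Compatibility matrixing according to equations (1) through (5)."""
    T1 = L + x * C + y * LS  # Lo, equation (1)
    T2 = R + x * C + y * RS  # Ro, equation (2)
    T3 = x * C               # equation (3)
    T4 = y * LS              # equation (4)
    T5 = y * RS              # equation (5)
    return T1, T2, T3, T4, T5


def dematrix_32(T1, T2, T3, T4, T5, x=0.7, y=0.7):
    """Complete dematrixing according to equations (6) through (10)."""
    L = T1 - T3 - T4   # equation (6)
    R = T2 - T3 - T5   # equation (7)
    C = T3 / x         # equation (8)
    LS = T4 / y        # equation (9)
    RS = T5 / y        # equation (10)
    return L, R, C, LS, RS


# Illustrative sample values for L, R, C, LS, RS.
original = (0.5, -0.25, 0.125, 0.3, -0.1)
restored = dematrix_32(*matrix_32(*original))
```

Up to floating-point rounding, restored equals original, which corresponds to the complete reconstruction case.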

A complete reconstruction of the original five-channel stereo signals is, however, not necessary, because, as already mentioned, hearing tolerates an incomplete channel partitioning for certain spectral ranges within certain time intervals. Correspondingly, certain subband signals of the transmission signals T3, T4 and T5 in equations (3), (4) and (5) can be periodically set to zero in a signal-dependent manner (so-called "puncture coding"). A further consideration is to make the data rate for the additional signal packet T3/T4/T5 as low as possible by extracting exclusively the signal components relevant for spatial perception. The coding of the stereo base signals Lo and Ro remains unaffected by this, so that compatibility with existing dual-channel decoders is ensured. Instead of a multi-channel audio representation, multilingual, simple programs or comment channels can also be provided, particularly for use in future television systems that have digital sound. In this case no matrixing is provided.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is described in detail below in conjunction with embodiments in the drawings, in which:

FIG. 1 is a basic diagram of an encoder for encoder-side execution of the method according to the invention;

FIG. 2 is a basic diagram of a decoder for decoder-side execution of the method according to the invention;

FIGS. 3 and 5 are basic diagrams of compatible encoders for five-channel, stereophonic audio signals;

FIGS. 4 and 6 are basic diagrams of compatible decoders for five-channel, stereophonic audio signals, and

FIG. 7 illustrates a model of the binaural signal processing of human hearing.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the invention, in addition to redundancy, above all the limits of hearing are utilized as fully as possible in the spatial analysis of the audio field in the sense of a stereophonic coding, in that the bit allocation for individual subbands of individual audio channels is performed block-wise in accordance with a common analysis of the stereophonic signals of all audio channels. Here the quantization is not only established under consideration of the adjacent channel masking, but also under consideration of the subjectively sufficient channel partitioning, as will be explained in detail below.

A block-wise reduction of the channel partitioning of individual subband signals, which is controlled in a signal-dependent manner and is adapted to hearing, leads to a corresponding data reduction when the subband signals set to "mono" in the encoder are only transmitted in one of the channels and a suitable exchange of individual subband signals is possible in the decoder. The relevant subbands of the remaining audio channels are not transmitted. To this end networks ("combiners" or "decombiners") are provided in the encoder and in the decoder; these networks function according to control information that is also transmitted, is created from an encoder-side analysis, and allocates or links individual subband signals to individual audio channels in the same frequency range. As will be explained below, it is particularly advantageous to limit the subband-wise and block-wise reduction of the channel partitioning to the sampling values and to use the original scale factors of the individual audio channels, which were obtained on the side of the encoder and transmitted as additional information, for recovering the subband signal level.

An exemplary basic diagram of an encoder and a decoder for n audio channels ensues from FIGS. 1 and 2. In the encoder according to FIG. 1, the digitalized, linearly-quantized audio signals S1 through Sn are fed to a subband filter bank 10 and respectively separated into i subbands. The resulting n×i subband signals pass, one behind the other, through a combiner 11, a scaler 12 and a quantizer 13, and are combined, together with the additional information BAL (Bit Allocation Information) obtained in an analysis stage 15, COM (Combiner Mode Information) and SCF (Scale Factors) in a multiplexer 14 to form a multiplex signal MUX. From the analysis stage 15, a dynamic control signal COS is fed to the combiner 11, which signal controls an interchannel-wise combination of individual subband signals. In the scaler 12 the sampling values are normalized according to the scale factor information SCF, while the quantization of the normalized sampling values is performed in the quantizer 13 according to the bit allocation information BAL. How the calculation of all additional information for controlling the stages 11, 12, 13 is performed by the analysis stage 15 will be described in detail below.
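The interplay of scaler 12 and quantizer 13 for one block of one subband can be sketched as follows. This is a minimal illustration, not the coding defined by the method or by any standard: the scale factor is taken directly as the block peak (the text says it "classes" the peak, which suggests a quantized table value), the quantizer is an assumed uniform one, and a bit allocation below 2 is treated as "subband not transmitted". All function names are hypothetical.

```python
def encode_block(samples, bits):
    """Normalize one subband block by its peak and quantize with `bits` bits."""
    scf = max(abs(s) for s in samples) or 1.0  # assumed: SCF = block peak
    if bits < 2:
        return scf, []                         # no sampling values transmitted
    half = (1 << (bits - 1)) - 1               # largest code magnitude
    codes = [round(s / scf * half) for s in samples]
    return scf, codes


def decode_block(scf, codes, bits, block_len):
    """Dequantize the codes and denormalize with the scale factor."""
    if not codes:
        return [0.0] * block_len
    half = (1 << (bits - 1)) - 1
    return [c / half * scf for c in codes]


block = [0.5, -1.0, 0.25, 0.0]
scf, codes = encode_block(block, bits=8)
restored = decode_block(scf, codes, bits=8, block_len=len(block))
```

In this sketch the reconstruction error per sample stays within half a quantization step of the normalized range, scaled by the scale factor.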

In the decoder according to FIG. 2, after the separation of the received data stream MUX, the sampling values of the subband signals are recovered in normalized form in a dequantizer 23 downstream of the demultiplexer 24 with the aid of the transmitted bit allocation information BAL. In a decombiner 21, the recovered sampling values are divided subband-wise into n audio channels according to the combiner mode information COM. Only after this does a multiplication of the subband signals with the scale factors SCF take place in multiplier stages 22. After the inverse filtering of the denormalized subband signals in the synthesis filter banks 20, the broadband, digitalized audio signals S1' through Sn' are present at the outputs of the decoder according to FIG. 2.

A particular embodiment of the method according to the invention is represented in FIGS. 3 and 4. Considered here is the requirement mentioned at the outset of compatibility with existing subband coders according to the digital standard ISO 11172-3, as is the requirement of flexible use of the audio-transmission channels. The point of departure in the case of the encoder according to FIG. 3 is five-channel, stereophonic, digitalized audio signals, namely 3 front signals Left (L), Right (R) and Center (C) and two surround signals Left Surround (LS) and Right Surround (RS). For reasons of compatibility with existing standard subband coders, the front signals L and R, which are respectively filtered into i subbands in subband filters 30, are matrixed in a combiner 31, for example according to the above equations (1) and (2), to form a dual-channel, stereo base signal packet Lo'/Ro' which is subband-coded in a known way in a stage 37 designated as a "subband encoder and standard multiplexer" and converted into a standardized bitstream according to ISO Standard 11172-3, forming a compatible stereo signal MUXo.

The center signals and surround signals C, LS, RS, which are respectively filtered into i subbands in subband stages, are matrixed to a subband-filtered additional signal packet T3/T4/T5 in the combiner 31, for example according to equations (3) through (5). This additional signal packet T3/T4/T5 is subjected to a scaling (scaler 32), a quantization (quantizer 33) and a multiplex formation (multiplexer 34) in the same way as in FIG. 1, and the control of the stages 31, 32, 33 and 34 is again effected by an analysis stage 35. In comparison to the analysis stage 15 according to FIG. 1, the analysis stage 35 according to FIG. 3 additionally generates difference scale factor information ΔSCF and configuration information CFG, which are inserted into the multiplex signal MUX of the multiplexer 34. From a user-operated system configuration input 36, the analysis stage 35 obtains application-dependent information about the type of desired processing of the input signals, for example a five-channel data reduction under consideration of the stereophonic irrelevance of all channels (surround audio representation) or a coding of the input signals C, LS, RS independently of the front signals L, R, for example for a multilingual television sound. In the latter example, LS, RS, C are not surround channels, but alternative voice channels, as will be described by way of FIG. 5. In addition to the n subband-filtered input signals, the analysis stage 35 obtains the subband-filtered stereo base signals Lo, Ro.

The above-named input signals are analyzed in the analysis stage 35 corresponding to the psychoacoustic model used there and the criteria derived therefrom, which will be described below. If the analysis reveals that a channel partitioning is not subjectively necessary, the channel partitioning is canceled in that subband signals of the additional signal packet C/LS/RS, which were determined with the aid of the bit allocation information BAL, are set to zero block-wise in the quantizer 33. The creation of the additional multiplex signal MUX in the multiplexer 34 also encompasses source-related error-control measures. The difference scale factor information ΔSCF consists of the difference between the scale factors SCFo of the subbands of the stereo base signals Lo, Ro and the scale factors SCF of the subbands of the additional signals C, LS, RS, and thus permits a bit-saving coding of the scale factors SCF. Instead of an explicit transmission of the scale factors SCF in the multiplex signal MUX according to FIG. 1, in the encoder according to FIG. 3 only the difference values ΔSCF are transmitted.

The resulting bitstreams MUX and MUXo of the encoder according to FIG. 3 are transmitted in an arbitrary manner in separate data blocks or stored. At the location of the decoder according to FIG. 4, it must be ensured that the two bitstreams MUX and MUXo are available simultaneously. It is also possible to transmit or store the resulting bitstreams MUX and MUXo in a higher-order multiplex. For this purpose, the bitstream MUX, for example, as shown in FIG. 5, is fed to the stage 37, which inserts the bitstream MUX into the bitstream MUXo in a suitable way, creating the combined bitstream MUXo*. This can be effected, for example, in a data stream according to ISO Standard 11172-3 in that the MUX data are inserted in each audio data frame inside a frame that is reserved for program-related data.

In the decoder according to FIG. 4, the subband signals of the dual-channel stereo base signal packet Lo'/Ro' are recovered in a known way from the bitstream MUXo in a stage 47 designated as "standard demultiplexer and subband decoder." Moreover, the subband signals of the additional signal packet are recovered from the bitstream MUX with the aid of dequantizers 43 and multipliers 42 after completed demultiplex formation in a demultiplexer 44. The necessary scale factors SCF of the subband signals C, LS, RS are calculated in a control unit 45 from the difference scale factor information supplied by the demultiplexer 44 and the scale factors SCFo of the subband base signals Lo, Ro supplied by the stage 47. As in the case of FIG. 2, the dequantizers 43 are controlled by the bit allocation information BAL, and the multipliers 42 are controlled by the scale factors SCF calculated in the control unit 45.
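The difference coding of the scale factors and their recovery in the control unit 45 can be sketched as follows. The sketch assumes that scale factors are handled as integer indices per subband and that each additional signal is paired with one base signal; neither detail is specified in the text, and the function names are hypothetical.

```python
def encode_scf_difference(scf_base, scf_extra):
    """Encoder side: per-subband difference SCFo - SCF."""
    return [b - e for b, e in zip(scf_base, scf_extra)]


def decode_scf(scf_base, scf_difference):
    """Decoder side: recover SCF from SCFo and the transmitted difference."""
    return [b - d for b, d in zip(scf_base, scf_difference)]


# Illustrative scale factor indices for four subbands.
scf_o = [12, 15, 20, 31]   # base signal (Lo or Ro)
scf_c = [14, 15, 18, 40]   # additional signal (for example C)
diff = encode_scf_difference(scf_o, scf_c)
recovered = decode_scf(scf_o, diff)
```

When the levels of base and additional signals are similar, the differences are small numbers, which is what permits the bit-saving coding.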

The decoded subband signals Lo', Ro' at the output of the stage 47 and the decoded subband signals T3', T4' and T5' at the output of the multipliers 42 reach the decombiner 41, where the already-mentioned channel allocation or the interchannel-wise linking of the subband signals is effected under control of the control unit 45, which calculates the control signal COS from the additional information COM and CFG supplied by the demultiplexer 44 and from an input signal EIN. The input signal EIN is entered by a user by way of a playback configuration input 46; it represents the desired listening situation and, in the case of the multilingual television sound, the desired playback language. Furthermore, the user can enter data about level balance and listening dynamics into the control unit 45 by way of the input 46, and, according to these entries, the control unit controls the weighting stages 48, which are downstream of the decombiner 41 and multiply the subband signals at the outputs of the decombiner 41 block-wise with corresponding weighting factors GEW. After subsequent inverse filtering in the synthesis filter banks 40, the broadband, digitalized audio signals L', R', S3', S4' and S5' are present at the outputs of the decoder according to FIG. 4.

In contrast to the decoder according to FIG. 4, in the decoder according to FIG. 6 the data stream MUX and the scale factors SCFo of the subband base signals Lo, Ro are separated out of the combined data stream MUXo* in the stage 47 and fed to the demultiplexer 44. The direct feeding of SCFo from the stage 47 to the control unit 45 (FIG. 4) is omitted; rather, in FIG. 6 the scale factors SCFo are fed to the control unit 45 from the demultiplexer 44. Otherwise, the decoder according to FIG. 6 is identical to the decoder according to FIG. 4.

For a detailed explanation of the method according to the invention by way of the embodiments according to FIGS. 3 through 6, first the hearing properties forming the basis of stereophonic coding will be illustrated and, based on this, the measures used for data reduction. Further embodiments, particularly with regard to a flexible utilization of the transmission channels, are described last.

Model of Binaural Signal Processing

For spatial hearing, differences between the acoustic signals at both ears (interaural signal differences) have tremendous significance. In the literature, a plurality of models are distinguished which already permit a comprehensive description of binaural effects. However, the knowledge of the underlying physiological processing mechanisms decreases the further one advances into the central neuronal regions. For this reason the peripheral stages of binaural models are uniform at least in their basic conception, whereas the central model areas either specialize in narrowly-limited, psychoacoustic effects or permit only imprecise statements. A model of binaural signal processing is shown in FIG. 7. The relatively precise description of the peripheral stage by way of the functional blocks A, B, C, D, as well as general assumptions about the mode of operation of the central processing stage ("binaural processor") permit a crude acquisition of the "binaural information." The model according to FIG. 7 includes the following stages:

Outer-Ear Filter:

The effect of the outer ear (pinna, head, torso) on the left and right ear signals is described by way of outer-ear transmission functions ("outer-ear filter"). By means of this, a spatial separation as well as a spectral and temporal filtering of the source signal are effected in the sense of a "spatial coding." The effect is negligible below approximately 150 Hz, i.e. correspondingly low-frequency source signals are irrelevant with respect to binaural evaluation.

Peripheral Stage A:

It has been proven not only for monaural, but also for binaural signal processing that the evaluation of each ear signal takes place in a frequency-selective manner. The hearing separates the ear signals into narrow-band components f1, f2 . . . fn. The analysis bandwidth of the relevant narrow-band filter corresponds to the frequency group widths that are measured in monaural experiments, i.e. the peripheral frequency selectivity that has also been proven physiologically becomes effective in both monaural and binaural signal processing.

Peripheral Stage B:

The following model component B for simulating the neuronal properties is also physiologically and psychoacoustically founded. The output signals of the hair cells which sample the mode of oscillation along the basilar membrane are sequences of neural impulses (spikes) whose temporal density increases in the relevant frequency band with an increasing signal level ("pulse frequency modulation"). To describe the properties of the neural impulse series which are important for directional hearing, in a first approximation it is sufficient to provide a simple receptor-neuron model of half-wave rectification and subsequent low-pass filtering for each frequency group. The half-wave rectification takes into consideration that the hair cells only respond to a half-wave of the basilar membrane oscillation. The first-order low-pass, having a limiting frequency of approximately 800 Hz, serves to obtain the envelope, and takes into consideration that, with rising frequency, the activity probability of the neurons follows the envelope more and more and the fine structure of the basilar membrane oscillation less and less, whereas conversely, proportionality exists in the range of low frequencies.
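The receptor-neuron model of stage B (rectification followed by a first-order low-pass of approximately 800 Hz) can be sketched for one frequency group as follows. The discrete-time one-pole filter used here is an assumed realization of the first-order low-pass, and the function name and test tone are purely illustrative.

```python
import math


def envelope(signal, sample_rate_hz, cutoff_hz=800.0):
    """Half-wave rectify a band signal and smooth it with a one-pole low-pass."""
    # Smoothing coefficient of an assumed one-pole realization of the low-pass.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, state = [], 0.0
    for s in signal:
        rectified = s if s > 0.0 else 0.0    # only one half-wave passes
        state += alpha * (rectified - state)  # first-order low-pass update
        out.append(state)
    return out


# Illustrative input: 4 kHz tone sampled at 48 kHz (10 ms).
sample_rate = 48_000
tone = [math.sin(2.0 * math.pi * 4000.0 * n / sample_rate) for n in range(480)]
env = envelope(tone, sample_rate)
```

For a tone well above the limiting frequency, the output mainly tracks the level of the rectified signal rather than its fine structure, in line with the description above.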

Peripheral Stage C:

The nonlinear stage C formed from a division unit and a fed-back RC element effects a dynamic compression for stationary signals, while rapid changes in the input signal are extensively transmitted in linear fashion. Because of this, masking effects over time (particularly post-masking) are simulated; furthermore, the stage C is likewise primarily responsible for important binaural time effects (e.g. "law of the first wave front," "Franssen effect"). The time constant for the drop of the envelope is on the average approximately 50 ms; conversely, for the rise, 1.25 ms=1/800 Hz. Because of this, a high reduction in information results.

Peripheral Stage D:

The output signals of stage C are subjected to a time-frequency transformation (e.g. Fast Fourier Transformation) in peripheral stage D that represents the spectral properties of the envelope. They change relatively slowly, which corresponds to the changes in the activity states in the acoustic cortex. The frequency-selective output signals are presented simultaneously as a pattern to the binaural processing stage (binaural processor).

Binaural Processor:

For frequency-selective processing in the binaural processor, the following is assumed:

1. The evaluation is effected by means of pattern-recognition mechanisms, i.e. by means of comparison of the actual signal pattern with a supply of stored (learned) patterns. This is a process in which an actual signal pattern is linked with a stored pattern, even when only parts of the stored pattern are included in the actual signal pattern, or when only an incomplete actual signal pattern is present.

2. A prerequisite for the recognition of spatial properties is coherence of contralateral envelope signals ("binaural envelope coherence"), that is, individual envelope signals from the left and right ear that have the same frequency position represent pattern elements that are evaluated with regard to their binaural information.

3. The binaural information obtained in this manner is then allocated to the corresponding, monaural information obtained in a separate pattern-recognition process (location- and shape-determining pattern recognition).

Selection of the Stereophonic Information

With respect to stereophonic coding, the model according to FIG. 7 first makes the important statement that the selection of the stereophonic information is possible when the evaluation takes place in a frequency-selective manner, i.e. when a separation into subbands (if possible, over the entire frequency group) is provided. The properties of the frequency-selective signals at the outputs of the stages D and the properties of pattern recognition give indications for coding of the stereophonic additional signals in the subbands.

Accordingly, the envelopes of all subband signals are extremely significant. In the range of higher frequencies (above approximately 1.5 kHz), they do not permit evaluation of the interaural phase differences. This is practical, because above approximately 1.5 kHz the interaural phase difference no longer represents an unequivocal directional feature. Lateralization experiments show that, in the entire audio frequency range (from approximately 150 Hz), interaural level and time differences of the envelopes lead to a lateral deflection of the audio event, but this happens in interaural time differences of the signal fine structures only in the frequency range of up to approximately 1.5 kHz. With amplitude-modulated high-pass noise, one then finds a fused audio event even when interaurally uncorrelated noise is used, provided only that the two ear signals have the same envelope.

This subband envelope evaluation means, with respect to stereophonic coding, that the stereophonic information of the upper subband signals is represented solely by their envelopes, and that, as a consequence, the stereophonic signals can be combined at higher frequencies when the envelopes of the upper subband signals have been extracted in advance. This stereophonic information is suited for reconstructing a subjectively sufficient channel partitioning in the decoder in that the monophonic subband signals are modulated with the original envelopes. The result is stereophonic signals which are only distinguished with respect to their envelopes, but not with respect to their signal fine structure, and which nevertheless assure an unimpaired stereophonic audio impression. A prerequisite for this is the sufficiently precise reproduction of the subband envelopes. It has already been proven that, even with crude temporal resolution, the resulting impairments to quality are relatively slight. They can be avoided entirely when the reduction of the channel partitioning is only effected intermittently and particularly more frequently in the uppermost frequency range than beneath it.

The temporal effects which occur in the binaural processor and their evaluation during stereophonic coding are considered below. The behavior over time of the signals at the outputs of the stage D and the process of pattern recognition cause limits of the binaural time analysis which are expressed in three effects:

The "Franssen effect" says that two spatially separate noise events produce a single audio event at the location of that loudspeaker which emits the leading (pulse-affected) audio signal. The delayed audio signal is irrelevant for directional allocation.

The "inertia effect" states that changes in direction in predominantly stationary sounds are perceptible only as of a certain duration of change. Therefore, a stereo signal can be switched, for example, to mono for a short time ("soft") without causing interference of the stereophonic sound image.

A common feature of all three effects is that a series of short-time noise events is irrelevant with respect to spatial perceptibility, and is therefore unnecessary as spatial information. These irrelevant noise events are longer the more narrow-band they are. The position over time of these irrelevant noise events or audio signal segments can be determined by way of the subband signal-envelope courses; in a first approximation they lie directly behind their leading edges.

Features of the Method

The combiner 31 in the encoder (FIGS. 3 and 5) and the decombiner 41 in the decoder (FIGS. 4 and 6) permit the performance or cancellation, respectively, of subband-wise and block-wise, arbitrary matrixings and dematrixings. This results, on the one hand, in advantageous options for data reduction and, on the other hand, in options for the alternative use of the audio-transmission channels, for example transmission of an alternative language, as a function of the selection of the system configuration by means of the input 36 (FIGS. 3 and 5).

System Configuration A

This system configuration represents the data reduction for five-channel, stereophonic signals, i.e. it is

S3=C

S4=LS

S5=RS.

The combiner 31 in the encoder according to FIGS. 3 and 5 matrixes the signals L, R, C', LS', RS' according to the equations (1) through (5), while in the decombiner 41 of the decoder according to FIGS. 4 and 6, dematrixing takes place according to equations (6) through (10). The data reduction takes place through the following measures:

1. Dynamic Cross-Talk

A high data reduction for the additional signal packet T3/T4/T5 can be achieved in that, in accordance with a suitable signal analysis in the analysis stage 35, dematrixing is canceled block-wise in individual subbands. This takes place with the aid of a block-wise, so-called "puncture-coding" of subband signals of the additional signal packet T3/T4/T5 (equations (3), (4), (5)). In puncture coding the sampling values of the corresponding subband signal are set to zero in the combiner 31 on the side of the encoder, and/or the quantizer 33 according to the control signal COS and/or the bit allocation information BAL. On the side of the decoder, the missing sampling values in the subbands are substituted by transmitted sampling values in the subbands of other audio channels, but of the same frequency position, according to the control signal COM. The substituted sampling values are subsequently weighted in the stages 48 such that an adaptation to the original subband signal level conditions is effected. To perform this adaptation, the weighting factor GEW (FIGS. 4 and 6) is calculated in the control unit 45 from the scale factors SCF and the combiner mode COM, for example by means of interpolation of sequential scale factors, to prevent "hard" level jumps.
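The substitution and level adaptation described above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the description: the function names puncture_block and substitute_block are hypothetical, the weighting factor GEW is modeled simply as the ratio of the target scale factor to the donor scale factor, and the interpolation of sequential scale factors is modeled as a linear ramp across one block.

```python
import numpy as np

def puncture_block(block):
    """Encoder side: set all sampling values of a punctured subband block to zero."""
    return np.zeros_like(np.asarray(block, dtype=float))

def substitute_block(donor_block, donor_scf, target_scf_prev, target_scf_cur):
    """Decoder side: fill a punctured subband block from a donor channel.

    donor_block      samples of the same frequency position in another channel
    donor_scf        scale factor of the donor block
    target_scf_prev  scale factor of the punctured subband in the previous block
    target_scf_cur   scale factor of the punctured subband in the current block

    Assumption: GEW is the ratio target level / donor level, ramped linearly
    over the block so that no "hard" level jump occurs at the block border.
    """
    donor_block = np.asarray(donor_block, dtype=float)
    if donor_scf <= 0:
        raise ValueError("donor scale factor must be positive")
    target_level = np.linspace(target_scf_prev, target_scf_cur, len(donor_block))
    gew = target_level / donor_scf  # hypothetical weighting rule
    return gew * donor_block
```

With a constant donor block at level 0.5 and target scale factors 0.1 and 0.2, the substituted block rises smoothly from 0.1 to 0.2 instead of jumping at the block boundary.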

The rules for the application of puncture coding are the result of different hearing properties described in detail above. Essentially, puncture coding takes place in such a way that the information relevant for binaural signal processing remains in the additional signal packet T3/T4/T5. Hence, the individual signals combined in the stereo base signal packet Lo/Ro can be allocated on the decoder side to the original audio channels so far that the stereophonic audio image is subjectively reconstructed.

The audio signals are analyzed in the analysis stage 35 with regard to the following puncture-coding criteria:

subband signal dynamics

According to the "Franssen effect" and the "law of the first wave front," subband signal packets that follow a transient and behave in a stationary or quasi-stationary manner can be puncture-coded.

envelope evaluation

The evaluation of the envelopes of the subband signals in the binaural model according to FIG. 7 permits more frequent execution of puncture coding in subbands of a higher frequency position than in the subbands of lower frequency position.

signal energy conditions

The energy of a subband can be derived from its scale factors SCF. The summation of the signal energies of all of the subbands results in the signal energy of the entire audio signal. A further criterion for puncture coding can be derived from the comparison of the signal energy of the individual subbands with the signal energy of the entire audio signal. Puncture coding can take place more frequently in those subbands in which the signal energy is relatively low with respect to the signal energy of the entire audio signal. "Relatively low" is to be understood as, for example, a ratio of 1/i, where i is the number of subbands per audio channel.
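The energy criterion can be sketched as follows. This is an illustration under assumptions: the function name low_energy_subbands is hypothetical, and the energy of a subband is approximated as the square of a single linear scale factor per subband, which the description does not prescribe.

```python
def low_energy_subbands(scf):
    """Return the indices of subbands whose energy share is below 1/i.

    scf: one linear scale factor per subband of a single audio channel.
    Assumption: subband energy is approximated by the squared scale factor.
    """
    energies = [s * s for s in scf]
    total = sum(energies)          # energy of the entire audio signal
    i = len(scf)                   # number of subbands per audio channel
    if total == 0 or i == 0:
        return []
    return [k for k, e in enumerate(energies) if e / total < 1.0 / i]
```

For four subbands with scale factors 1.0, 0.1, 0.5 and 0.05, the threshold share is 1/4, so only the first subband is excluded from the candidates for more frequent puncture coding.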

inertia effect of human hearing

As already mentioned above, the "inertia effect" of human hearing means that changes in direction in predominantly stationary sounds become perceptible only as of a certain duration of change. Two options for puncture coding result:

a) If only a short-time change in direction is detected in the encoder, the subbands responsible for it can nevertheless be puncture-coded.

b) When puncture codings only take place for a short time, for example for reasons of a short-time overload of the transmission channel for the data stream MUX, the hearing cannot perceive the interferences in the spatial image caused by this.

partial adjacent-channel masking

The above-mentioned effect of "adjacent-channel masking" can not only be employed to establish quantization or to suppress individual subband signals--as will be described in detail below--but can also be used as a further puncture-coding criterion. For example, subbands that are only partly, that is, not completely, masked by adjacent channels are puncture-coded sooner than subbands that are not subjected to adjacent-channel masking.

utilization of redundancy

Because of the compatibility matrixing according to equations (1) through (5), individual subband signals of T1 and/or T2 are intermittently completely or nearly identical to corresponding subband signals of the same frequency position of T3, T4 and/or T5. Because of this redundancy, the transmission of identical subband signals of the same frequency position in the additional signal packet T3/T4/T5 can be omitted. On the decoder side, the puncture-coded subband signals of the additional signal packet T3/T4/T5 are substituted only in their original audio channel during recombination. If, for example, a subband signal of the center channel S3 in the additional signal packet T3/T4/T5 is missing, on the decoder side it must be ensured that, from the left or right signal of the stereo base signal packet Lo/Ro, substitution only takes place in the center playback channel S3', but not in the two other playback channels S4' and S5' (FIG. 4). Moreover, the center components in the stereo base signals Lo' and Ro' must be suppressed by the decombiner 41, because, due to the missing center channel, dematrixing according to equations (6) through (8) is no longer possible.

alias distortions

As a consequence of the incomplete reconstruction of the audio signal in the decoder caused by puncture coding, the cancellation of the alias distortions is not performed completely during back-filtering in the synthesis filter banks 40 (FIGS. 4 and 6). These remaining alias distortions can be predetermined in the encoder, so that their masking can be calculated. No puncture coding can take place without masking.

negative correlation

If a negative correlation occurs, that is, phase opposition, between subband signals of the same frequency position, the consequence of this in matrixing according to equations (1) and (2) is that extinctions occur in the left or right signal of the stereo base signal packet Lo/Ro. These extinctions are canceled during dematrixing according to equations (6) and (7). This cancellation ceases, however, when one of the signals T3, T4 or T5 is not transmitted as a consequence of a puncture coding. For this reason, no puncture coding may take place while a negative correlation is determined.
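A test for phase opposition between two subband blocks of the same frequency position could look like the sketch below. It is an assumption-laden illustration: the description does not say how the correlation is determined, so a normalized zero-lag cross-correlation is used here, and the function names and the threshold parameter are hypothetical.

```python
import numpy as np

def correlation_coefficient(a, b):
    """Normalized zero-lag cross-correlation of two subband blocks (range -1..1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom == 0.0:
        return 0.0  # at least one silent block: no correlation can be stated
    return float(np.sum(a * b) / denom)

def puncture_allowed(a, b, threshold=0.0):
    """Forbid puncture coding while the blocks are negatively correlated.

    threshold is a hypothetical tuning parameter; 0.0 means that any
    negative correlation blocks puncture coding.
    """
    return correlation_coefficient(a, b) >= threshold
```

Two blocks in exact phase opposition yield a coefficient of -1, so puncture coding is blocked; identical blocks yield +1 and leave the decision to the other criteria.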

2. Common Perceptual Threshold

The bit allocation of the subband signals in the individual audio channels is calculated on the basis of the signal/masking interval, taking into consideration all audio channels and all i subbands. The signal/masking interval is to be understood as the ratio of the maximum subband signal level to the minimum perceptual threshold in a subband.

It is therefore necessary to determine the maximum signal level for each of the audio channels and the minimum perceptual threshold per subband and per temporal block. The minimum perceptual threshold is calculated in the analysis stage 35 (FIGS. 3 and 5) from a Fourier analysis which is performed in parallel with the subband filtering and is followed by a psychoacoustic model. This parallel concept has the advantage that an inadequate frequency separation of the filter bank 30 can thus be compensated, because on the one hand a good spectral resolution--determined by the Fourier analysis--is necessary for determining the spectral perceptual threshold and, on the other hand, a good time resolution--determined by the filter bank 30--of the individual audio channels is given.

Because of this parallel concept, frequencies and levels of alias components can also be determined, which is particularly important with regard to the operations performed by the combiner 31 (FIGS. 3 and 5). The decisions of this special signal analysis in 35 are also inserted into the control signal COS, which controls the combiner 31.

Experiments have shown, for example, that approximately 24 ms suffice for the temporal block for calculating the perceptual threshold; this corresponds to 1152 input PCM values at a sampling frequency of the audio channels of 48 kHz, and represents a good compromise between the temporal course of the audio signal and the structural complexity of the analysis stage 35.
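The stated block length follows directly from the sampling frequency. The helper below is only a worked check of that arithmetic; the function name is hypothetical.

```python
def block_duration_ms(num_samples, sampling_rate_hz):
    """Duration in milliseconds of a block of PCM values at a given sampling rate."""
    return 1000.0 * num_samples / sampling_rate_hz

# 1152 PCM values at 48 kHz span 24 ms
duration = block_duration_ms(1152, 48000)
```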

The calculation of the more precise frequency resolution is, however, only necessary on the side of the encoder, not in the decoder, which is important for mass-production of decoders.

The calculation of the common perceptual threshold and the signal/masking interval derived therefrom is based on the following steps:

1. Step

Calculation of the Fourier transformation for time/frequency transformation of the individual audio channel signals.

2. Step

Determination of the signal level in each subband for each audio channel signal.

3. Step

Determination of the resting threshold, where an additional margin of safety is employed to increase the transmission dynamics.

4. Step

Determination of the tonal components, that is, the components represented by a pure sound, and the noisier components of an audio channel in order to have available a distinction between tonal maskers and noise maskers.

5. Step

Reduction in the tonal and noisy components to components that are acousto-physiologically relevant by taking into consideration the mutual masking of these components and masking by the resting threshold.

6. Step

Calculation of the individual perceptual thresholds of the relevant components of the tonal and noisy components.

7. Step

Determination of the global perceptual threshold per audio channel.

8. Step

Determination of the minimum perceptual threshold in each subband per audio channel.

9. Step

Determination of the maximum value of all minimum perceptual thresholds in subbands of the same frequency position of the unmatrixed audio channel signals L, R, C, LS, RS.

10. Step

Consideration of the effect of a reduced masking in spatially-separated masking ("Masking Level Difference"=MLD) in the determination of the maximum value according to Step 9 in the sense of a reduction in the determined maximum value.

11. Step

Consideration of an expansion of the audible zone in the determination of the maximum value according to Steps 9 and 10. Because the calculation of the MLD is essentially only valid for a single audible location, the common perceptual threshold, that is, the maximum value determined according to Steps 9 and 10, is additionally reduced.
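Steps 9 through 11 and the signal/masking interval derived from them can be summarized in a short sketch. It is only an illustration under assumptions: all levels are taken to be in dB, the reductions for the MLD (Step 10) and for the expanded audible zone (Step 11) are modeled as fixed dB offsets, and the names common_threshold_db, signal_to_mask_db, mld_db and zone_margin_db are hypothetical.

```python
import numpy as np

def common_threshold_db(min_thr_db, mld_db=0.0, zone_margin_db=0.0):
    """Common perceptual threshold per subband, in dB.

    min_thr_db: array of shape (channels, subbands) holding the minimum
                perceptual thresholds from Step 8.
    Step 9:  maximum over the channels for each frequency position.
    Step 10: reduction by the masking level difference (assumed fixed offset).
    Step 11: further reduction for an expanded audible zone (assumed offset).
    """
    thr = np.asarray(min_thr_db, dtype=float)
    max_over_channels = thr.max(axis=0)
    return max_over_channels - mld_db - zone_margin_db

def signal_to_mask_db(max_level_db, threshold_db):
    """Signal/masking interval: maximum subband level minus threshold, in dB."""
    return np.asarray(max_level_db, dtype=float) - threshold_db
```

For two channels with minimum thresholds of (10, 20) dB and (15, 5) dB, Step 9 gives (15, 20) dB; assumed reductions of 3 dB and 2 dB then yield a common threshold of (10, 15) dB, against which the maximum levels of every channel are compared.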

System Configuration B

This system configuration is represented on the one hand by a data reduction of a known type for the dual-channel stereophonic signals L and R and, on the other hand, by a data reduction that is completely independent thereof for channels S3, S4 and S5. The channels S3, S4 and S5 can have program contents that are independent of one another--for example, they can be used strictly as comment channels; however, they can also encompass a further stereophonic audio signal pair and an unused or an arbitrary other audio channel. In system configuration B no matrixing of the audio channels takes place in the combiner 31, that is, the combiner 31 is functionless. The calculation of a common perceptual threshold for the channels S3 through S5 no longer takes place in the analysis stage 35; rather, for each individual channel S3 through S5, the individual perceptual threshold is calculated, from which correspondingly different bit allocation information BAL is generated for the quantization stages 33. Furthermore, a data reduction according to the rules of dynamic cross-talk (puncture coding) does not take place in the analysis stage 35. The encoder according to FIG. 5, in which the data stream MUX is merged into the data stream MUXo, is particularly advantageous for configuration B. All five audio channel signals whose bit flow respectively varies over time are inserted into a common, fixed data frame. Because the bit flow fluctuations of the five audio channels approximately compensate one another in the statistical average, an improved utilization of the transmission or storage capacity of the combined data stream MUXo* results.
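The benefit of the common, fixed data frame can be made concrete with a toy comparison. The sketch assumes a hypothetical per-block bit demand for each channel and compares a rigid per-channel split of the frame with a pooled frame of the same total size; the function names and the numbers are illustrative only and do not come from the description.

```python
def unmet_bits_fixed(demand, bits_per_channel):
    """Bits that cannot be served when every channel has its own fixed share."""
    return sum(max(0, d - bits_per_channel) for d in demand)

def unmet_bits_pooled(demand, bits_per_channel):
    """Bits that cannot be served when all channels share one common frame."""
    frame_capacity = bits_per_channel * len(demand)
    return max(0, sum(demand) - frame_capacity)

# Hypothetical demand of five channels in one block, 200 bits per channel on average
demand = [300, 100, 250, 150, 200]
shortfall_fixed = unmet_bits_fixed(demand, 200)    # 150 bits cannot be placed
shortfall_pooled = unmet_bits_pooled(demand, 200)  # 0 bits: fluctuations cancel
```

In this example the channels above and below their average share balance out exactly, so the pooled frame serves the full demand while the rigid split does not.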

On the decoder side, the decombiner 41 in the system configuration B can be used as a channel selector and a channel mixer, depending on the control of the control unit 45 by the playback configuration input 46.

Moreover, the analysis stage 35 of the encoder in system configuration B produces no difference scale factor values SCF; instead, the scale factor values SCF of the channels S3 through S5 are further conducted to the multiplexer 34, in coded form as the case may be. On the decoder side, the scale factors SCFo are ignored by the control unit 45.

What is claimed is:
 1. A method of transmitting or storing digitalized, multi-channel audio signals, comprising the steps of: (a) encoding the multi-channel audio signals, step (a) including (1) for each audio channel, generating a number of spectral subband signals; (2) analyzing the subband signals to obtain a dynamic control signal which is oriented toward a binaural, psychoacoustic model, combiner mode information which is derived from the dynamic control signal, bit allocation information, and for each subband signal, a scale factor that classes a peak value of the respective subband signal within a certain time block; (3) combining the subband signals of different audio channels, but which have the same frequency position, to obtain combined subband signals, step (a-3) being conducted in an interchannel manner according to the dynamic control signal; (4) normalizing the combined subband signals using the scale factors to obtain normalized subband signals; and (5) quantizing the normalized subband signals to obtain encoded subband signals in accordance with a channel-specific, intra-channel perceptual threshold of human hearing, step (a-5) being controlled by the bit allocation information; (b) transmitting or storing the encoded subband signals, the bit allocation information, the scale factors, and the combiner mode information; and (c) decoding and processing the encoded subband signals, step (c) including (1) decoding the encoded subband signals according to the bit allocation information to obtain decoded subband signals; (2) decombining the decoded subband signals of different audio channels, but the same frequency position, to obtain decombined subband signals, step (c-2) being conducted as a function of the combiner mode information; (3) denormalizing the decombined subband signals according to the scale factors, to obtain denormalized subband signals; and (4) generating a broadband digital audio signal from the denormalized subband signals.
 2. A method according to claim 1, wherein step (a) comprises effecting puncture coding with the aid of at least one of the dynamic control signal and the bit allocation information, so that encoded subband signals for relevant subbands need not be transmitted or stored in step (b), wherein step (c) comprises substituting the non-transmitted or non-stored encoded subband signals in the relevant subbands by transmitted or stored encoded subband signals in the subbands of other audio channels, but of the same frequency position, according to the combiner mode information, and wherein the substituted encoded subband signals are adapted with respect to their level to original level conditions of the relevant subbands.
 3. A method according to claim 2, further comprising the step of calculating weighting factors, with which the substituted encoded subband signals are weighted, from the scale factors for the level adaptation.
 4. A method according to claim 2, wherein the puncture coding is effected as a function of a determination of a sequence of a transient and a stationary signal state of the same audio channel or adjacent audio channels.
 5. A method according to claim 2, wherein the puncture coding occurs more frequently in subbands of higher frequency position than in subbands of lower frequency position.
 6. A method according to claim 2, wherein an entire audio signal is comprised of the multi-channel audio signals, and the puncture coding occurs more frequently in subbands in which the signal energy is relatively low with respect to the signal energy of the entire audio signal.
 7. A method according to claim 2, wherein the puncture coding is effected with the utilization of the inertia effect of human hearing.
 8. A method according to claim 2, wherein the puncture coding is effected with the utilization of the psychoacoustic masking of adjacent audio channels.
 9. A method according to claim 2, wherein the puncture coding is effected with the utilization of redundancy in subbands of the same frequency position of adjacent audio channels, and wherein during step (c) the substituted encoded subband signals are only substituted in their original audio channel during recombination.
 10. A method according to claim 2, wherein the step (a) further comprises the step of determining whether alias distortions exist, and wherein the puncture coding is only effected when alias distortions are irrelevant with respect to properties of human hearing.
 11. A method according to claim 2, wherein puncture coding is not effected when a negative correlation exists between subband signals of the same frequency position.
 12. A method of transmitting or storing digitalized audio signals for a plurality of intercorrelated audio channels, the intercorrelated audio channels including a left channel, a right channel, and at least one further channel, said method comprising the steps of: (a) encoding the audio signals, step (a) including (1) for each audio channel, generating a number of spectral subband signals; (2) analyzing the subband signals to obtain bit allocation information which is derived from a common perceptual threshold, the common perceptual threshold being formed from all of the audio channels, and for each subband signal, a scale factor that classes a peak value of the respective subband signal within a certain time block; (3) normalizing signals which are derived from the subband signals, using the scale factors, to obtain normalized subband signals; and (4) quantizing the normalized subband signals to obtain encoded subband signals in accordance with a channel-specific, intra-channel perceptual threshold of human hearing, step (a-4) being controlled by the bit allocation information; (b) transmitting or storing the encoded subband signals, the bit allocation information, and the scale factors; and (c) decoding and processing the encoded subband signals, step (c) including (1) decoding the encoded subband signals as a function of the bit allocation information to obtain decoded subband signals; (2) denormalizing signals derived from the decoded subband signals according to the scale factors, to obtain denormalized subband signals; and (3) generating a broadband digital audio signal from the denormalized subband signals.
 13. A method according to claim 12, wherein the common perceptual threshold is calculated with consideration of a masking difference in playback.
 14. A method according to claim 12, wherein the at least one further channel includes a center channel, a left surround channel, and a right surround channel.
 15. A method of transmitting or storing digitalized, multi-channel audio signals, comprising the steps of: (a) encoding the multi-channel audio signals, step (a) including: (1) for each audio channel, generating a number of spectral subband signals; (2) analyzing the subband signals to obtain bit allocation information, and for each subband signal, a scale factor that classes a peak value of the respective subband signal within a certain time block; (3) combining the subband signals of different audio channels, but which have the same frequency position, to obtain combined subband signals, step (a-3) being conducted as a function of an audio representation format of the audio channel corresponding to a system configuration; (4) normalizing the combined subband signals using the scale factors to obtain normalized subband signals; and (5) quantizing the normalized subband signals to obtain encoded subband signals in accordance with a channel-specific, intra-channel perceptual threshold of human hearing, step (a-5) being controlled by the bit allocation information; (b) transmitting or storing the encoded subband signals, the bit allocation information, the scale factors, and a control signal which is derived from the audio representation format; and (c) decoding and processing the encoded subband signals, step (c) including (1) decoding the encoded subband signals according to the bit allocation information to obtain decoded subband signals; (2) decombining the decoded subband signals of different audio channels, but the same frequency position, to obtain decombined subband signals, step (c-2) being conducted as a function of the control signal; (3) denormalizing the decombined subband signals according to the scale factors, to obtain denormalized subband signals; and (4) generating a broadband digital audio signal from the denormalized subband signals.